Project Swiss Cheese #4315

Closed
mmatczuk wants to merge 4 commits into main from mmt/project_swiss_chease

Conversation

@mmatczuk
Contributor

cassandra

  • avoid gocql host-lookup against docker internal IP

#4296

cockroachdb

  • fix data race between Read() and closeConnection()
  • tolerate context.Canceled in test stream goroutine

#4297

confluent

  • fix data race on shutSig in schema registry encoder

#4298

elasticsearch

  • tolerate context.Canceled in v8 test stream goroutine
  • tolerate context.Canceled in v9 test stream goroutine

#4299

kafka

  • fix data race on msgChan in connectExplicitTopics
  • adapt soak test duration to test deadline
  • use 127.0.0.1 instead of localhost in test broker address
  • unnest range_of_partitions from explicit_partitions test group
  • use lighter suite for partition-specific test groups
  • reduce partition cache ordering test batch count
  • use lighter suite for unordered partition-specific test groups
  • parallelize manual_partitioner subtest and raise package timeout
  • use lighter suite for unordered SASL test
  • shorten soak test default duration to 1 minute

#4301

migrator

  • tolerate context.Canceled in consumer group test stream goroutine

#4302

mongodb

  • fix data race on ChangeStream between TryNext and ResumeToken
  • fix ResumeWithSnapshot test expecting error on clean shutdown
  • tolerate context.Canceled in RunAsync helper
  • reduce sleep times in ResumeAfterSnapshotWithoutChanges test
  • handle null fullDocument in update_lookup mode
  • increase cdc package integration test timeout to 15m
  • check shutdown signal in readChan sends to prevent hang
  • reduce parallel snapshot test data from 1M to 100k documents
  • retry Ping to tolerate replica set startup lag

#4303

mssqlserver

  • guard Connect against concurrent calls
  • fix flaky CDC streaming tests under x86 emulation
  • fix flaky CDC checkpoint resume test under x86 emulation
  • tolerate context.Canceled in resume test stream goroutine
  • store checkpoint LSN as varbinary
  • synchronise ordering test stream goroutine

#4304

mysql

  • check shutdown signal in onMessage to prevent deadlock
  • ignore expected context.Canceled from stream Run in test goroutine
  • ignore expected context.Canceled from stream Run in DDL test goroutine
  • tolerate context.Canceled in resume test stream goroutine
  • tolerate context.Canceled in composite primary keys test stream goroutine

#4305

ollama

  • set seed and temperature for deterministic test output

#4306

oracledb

  • increase integration test timeout from 10m to 30m
  • handle LOB_TRIM events to establish LOB locator state
  • use background context for deferred EndSession cleanup
  • tolerate missing checkpoint cache table in CDC LOB toggle test
  • guard nil stream in CDC LOB toggle test cleanup
  • avoid data race in snapshot transaction rollback
  • do not hardcode IDENTITY column values in streaming tests
  • infer LOB locator for BASICFILE out-of-line LOBs
  • infer LOB locator for BASICFILE out-of-line LOB INSERTs

#4307

postgresql

  • fix data race in Stream.Stop() on pgConn
  • filter context.Canceled from streamOut.Run in test goroutine
  • ignore expected context.Canceled from stream Run in test goroutine
  • tolerate context.Canceled in heartbeat test stream goroutine
  • tolerate context.Canceled in txn markers test stream goroutine

#4308

pulsar

  • use native ARM image to fix container startup timeouts
  • disable StreamTestAtLeastOnceDelivery due to upstream data race

#4309

redis

  • increase pubsub channel buffer to prevent message drops

#4310

redpanda

  • fix data race on TestingOnSetSubjectMode callback
  • fix flaky chaos tests after container restart

#4311

redpanda_migrator

  • fix data race on groupsMigrator admin clients

#4312

redpandatest

  • retry broker container startup on readiness timeout

#4313

sql

  • fix data race in TestIntegrationCosmosDB

#4314

The Benthos CheckSkip guard requires test.run to be non-empty and to match
the test name. With --unit we omitted -run entirely, which caused all
integration tests to skip. Use -run ^Test instead to match every test
function while still satisfying the guard.
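The guard rule described in this commit can be approximated as follows. `guardAllows` is a hypothetical simplification that takes the pattern as an argument; it is not the Benthos `CheckSkip` implementation, which reads the `test.run` flag itself, and it ignores the `/`-separated subtest syntax that `go test -run` supports.

```go
package main

import (
	"fmt"
	"regexp"
)

// guardAllows reports whether a test named testName would run under the
// described rule: the -run pattern must be non-empty and must match the name.
func guardAllows(runPattern, testName string) bool {
	if runPattern == "" {
		return false
	}
	re, err := regexp.Compile(runPattern)
	if err != nil {
		return false
	}
	return re.MatchString(testName)
}

func main() {
	fmt.Println(guardAllows("", "TestIntegrationKafka"))      // false: -run omitted, test skips
	fmt.Println(guardAllows("^Test", "TestIntegrationKafka")) // true
}
```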
…e repair

Two-stage pipeline that uses Claude agents to triage and fix integration test
failures automatically.

The triage agent (sonnet) classifies each failure as test_infra or code_bug
and tracks it in Jira under CON-381 using structured output. The fix agent (opus)
operates in an isolated git worktree per package, applies targeted fixes with
one commit per issue, then cherry-picks results back to the current branch.
Cherry-picks are mutex-serialized so concurrent agents don't step on each other.

Two entry points:
- run --fix [--fix-max-parallel N]: dispatches triage+fix in the background as
  packages fail and waits for all agents before exiting.
- fix <output-file.txt>: standalone mode for re-triaging a previously captured
  test output file.

Other changes:
- Worktree recovery on startup picks up commits from prior interrupted runs.
- splitFlagsAndArgs allows interspersed flags and positional filters
  (e.g. "run --fix kafka --debug").
- All agent activity is logged to a shared agents.log with per-package prefixes.
  Triage cost and duration are always logged, since these are real dollars.
- Fixed JSON schema mismatch (failures vs issues) that would silently
  produce empty triage results.
- Fixed file handle leak in triage output error path.

Adds a --loop N flag that repeats runs until N clean iterations succeed,
retrying in place while fix agents may apply repairs. Packages currently
being fixed are postponed within an iteration instead of running against
stale code, and mgr.Wait() now runs on every iteration (not only on
failure). Adds Manager.IsFixing for the postpone check.
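The loop and postpone behaviour can be sketched under a few assumptions: "N clean iterations" is counted cumulatively, postponed packages run after `Wait` returns within the same iteration, and a `maxIter` cap exists only to keep the sketch from looping forever. `Manager` here is a hypothetical reduction that only tracks in-flight fixes; only the names `IsFixing` and `Wait` come from the commit message.

```go
package main

import (
	"fmt"
	"sync"
)

// Manager tracks which packages have a fix agent in flight (hypothetical).
type Manager struct {
	mu     sync.Mutex
	fixing map[string]bool
}

func (m *Manager) IsFixing(pkg string) bool {
	m.mu.Lock()
	defer m.mu.Unlock()
	return m.fixing[pkg]
}

// Wait stands in for mgr.Wait(); here it simply marks all fixes as finished.
func (m *Manager) Wait() {
	m.mu.Lock()
	defer m.mu.Unlock()
	m.fixing = map[string]bool{}
}

// iteration runs packages not being fixed, waits for fixes, then runs the
// postponed ones so they are tested against the repaired code.
func iteration(m *Manager, pkgs []string, run func(string) bool) bool {
	clean := true
	var postponed []string
	for _, p := range pkgs {
		if m.IsFixing(p) {
			postponed = append(postponed, p)
			continue
		}
		clean = run(p) && clean
	}
	m.Wait() // runs on every iteration, not only on failure
	for _, p := range postponed {
		clean = run(p) && clean
	}
	return clean
}

// loop repeats iterations until n clean ones have been seen or maxIter is hit.
func loop(n, maxIter int, iterate func() bool) (clean, iters int) {
	for iters < maxIter && clean < n {
		iters++
		if iterate() {
			clean++
		}
	}
	return clean, iters
}

func main() {
	m := &Manager{fixing: map[string]bool{"kafka": true}}
	attempts := map[string]int{}
	run := func(pkg string) bool {
		attempts[pkg]++
		return !(pkg == "mysql" && attempts[pkg] == 1) // mysql flakes once
	}
	clean, iters := loop(2, 10, func() bool {
		return iteration(m, []string{"kafka", "mysql", "redis"}, run)
	})
	fmt.Println(clean, iters) // 2 3
}
```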

Add confluent, cypher, iceberg/integration, mssqlserver/replication,
ollama, oracledb/replication, otlp, postgresql/pglogicalstream,
redpanda, and spicedb to the integration test package list.

Document skipped packages (cohere, cyborgdb, tigerbeetle, zeromq)
inline with a "skip" reason field. The loader filters them out
automatically.
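The loader's skip filtering might look like this, assuming a JSON list with `path` and `skip` fields; the real file format and field names are not shown here, so `pkgEntry` and `loadPackages` are hypothetical.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// pkgEntry is a hypothetical shape for one entry in the package list.
type pkgEntry struct {
	Path string `json:"path"`
	Skip string `json:"skip,omitempty"` // a non-empty reason means "do not run"
}

// loadPackages decodes the list and drops entries that carry a skip reason.
func loadPackages(data []byte) ([]string, error) {
	var entries []pkgEntry
	if err := json.Unmarshal(data, &entries); err != nil {
		return nil, err
	}
	var out []string
	for _, e := range entries {
		if e.Skip != "" {
			continue
		}
		out = append(out, e.Path)
	}
	return out, nil
}

func main() {
	data := []byte(`[
		{"path": "kafka"},
		{"path": "ollama"},
		{"path": "zeromq", "skip": "<reason>"}
	]`)
	pkgs, err := loadPackages(data)
	fmt.Println(pkgs, err) // [kafka ollama] <nil>
}
```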

mmatczuk changed the title from Project Swiss Chease to Project Swiss Cheese on Apr 20, 2026

@mmatczuk
Contributor Author

Re-creating with correct spelling and stats.

mmatczuk closed this on Apr 20, 2026
mmatczuk deleted the mmt/project_swiss_chease branch on April 20, 2026 at 14:09

claude Bot commented on Apr 20, 2026

Commits
LGTM

Review
Tooling-only PR under cmd/tools/integration (no component code). No high-signal issues found.

LGTM